5 research outputs found

    Reliable state retention-based embedded processors through monitoring and recovery

    No full text
    State retention power gating and voltage-scaled state retention are two effective design techniques, commonly employed in embedded processors, for reducing idle circuit leakage power. This paper presents a methodology for improving the reliability of embedded processors in the presence of power supply noise and soft errors. A key feature of the method is low cost, which is achieved through reuse of the scan chain for state monitoring, and it is effective because it can correct single and multiple bit errors through hardware and software respectively. To validate the methodology, ARM Cortex-M0 embedded microprocessor (provided by our industrial project partner) is implemented in FPGA and further synthesized using 65-nm technology to quantify the cost in terms of area, latency and energy. It is shown that the proposed methodology has a small area overhead (8.6%) with less than 4% worst-case increase in critical path and is capable of detecting and correcting both single bit and multi bit errors for a wide range of fault rates

    MLPerf Inference Benchmark

    Full text link
    Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded. Over 100 organizations are building ML inference chips, and the systems that incorporate existing models span at least three orders of magnitude in power consumption and five orders of magnitude in performance; they range from embedded devices to data-center solutions. Fueling the hardware are a dozen or more software frameworks and libraries. The myriad combinations of ML hardware and ML software make assessing ML-system performance in an architecture-neutral, representative, and reproducible manner challenging. There is a clear need for industry-wide standard ML benchmarking and evaluation criteria. MLPerf Inference answers that call. In this paper, we present our benchmarking method for evaluating ML inference systems. Driven by more than 30 organizations as well as more than 200 ML engineers and practitioners, MLPerf prescribes a set of rules and best practices to ensure comparability across systems with wildly differing architectures. The first call for submissions garnered more than 600 reproducible inference-performance measurements from 14 organizations, representing over 30 systems that showcase a wide range of capabilities. The submissions attest to the benchmark's flexibility and adaptability.Comment: ISCA 202

    Scale-out processors

    Get PDF
    Scale-out datacenters mandate high per-server throughput to get the maximum benefit from the large TCO investment. Emerging applications (e.g., data serving and web search) that run in these datacenters operate on vast datasets that are not accommodated by on-die caches of existing server chips. Large caches reduce the die area available for cores and lower performance through long access latency when instructions are fetched. Performance on scale-out workloads is maximized through a modestly-sized last-level cache that captures the instruction footprint at the lowest possible access latency. In this work, we introduce a methodology for designing scalable and efficient scale-out server processors. Based on a metric of performance-density, we facilitate the design of optimal multi-core configurations, called pods. Each pod is a complete server that tightly couples a number of cores to a small last-level cache using a fast interconnect. Replicating the pod to fill the die area yields processors which have optimal performance density, leading to maximum per-chip throughput. Moreover, as each pod is a stand-alone server, scale-out processors avoid the expense of global (i.e., interpod) interconnect and coherence. These features synergistically maximize throughput, lower design complexity, and improve technology scalability. In 20nm technology, scaleout chips improve throughput by 5x-6.5x over conventional and by 1.6x-1.9x over emerging tiled organizations. 1
    corecore